CENTRAL LIMIT THEOREM FOR SEQUENTIAL MONTE CARLO
METHODS AND ITS APPLICATION TO BAYESIAN INFERENCE 

By Nicolas Chopin 

Bristol University 

The term "sequential Monte Carlo methods" or, equivalently, 
"particle filters," refers to a general class of iterative algorithms that 
performs Monte Carlo approximations of a given sequence of distributions
of interest $(\pi_t)$. We establish in this paper a central limit
theorem for the Monte Carlo estimates produced by these computational
methods. This result holds under minimal assumptions on
the distributions $\pi_t$, and applies in a general framework which encompasses
most of the sequential Monte Carlo methods that have
been considered in the literature, including the resample-move algo- 
rithm of Gilks and Berzuini [J. R. Stat. Soc. Ser. B Stat. Methodol. 
63 (2001) 127-146] and the residual resampling scheme. The corresponding
asymptotic variances provide a convenient measurement of
the precision of a given particle filter. We study, in particular, in some 
typical examples of Bayesian applications, whether and at which rate 
these asymptotic variances diverge in time, in order to assess the long 
term reliability of the considered algorithm. 

1. Introduction. Sequential Monte Carlo methods form an emerging, yet 
already very active branch of the Monte Carlo paradigm. Their growing 
popularity comes in part from the fact that they are often the only viable 
computing techniques in those situations where data must be processed se- 
quentially. Their range of applicability is consequently very wide, and in- 
cludes nonexclusively signal processing, financial modeling, speech recog- 
nition, computer vision, neural networks, molecular biology and genetics, 
target tracking and geophysics, among others. A very good introduction to 
the field has been written by Künsch (2001), while the edited volume of
Doucet, de Freitas and Gordon (2001) provides an interesting coverage of 
recent developments in theory and applications. 

Specifically, sequential Monte Carlo methods (alternatively termed "particle
filters" or "recursive Monte Carlo filters") are iterative algorithms that
produce and update recursively a set of weighted simulations (the "particles")
in order to provide a Monte Carlo approximation of an evolving
distribution of interest $\pi_t(d\theta_t)$, $t$ being an integer index. In a sequential
Bayesian framework, $\pi_t(d\theta_t)$ will usually represent the posterior distribution
of parameter $\theta_t$ given the first $t$ observations. The term "parameter"
must be understood here in a broad sense, in that $\theta_t$ may include any unknown
quantity which may be inferred from the first $t$ observations, and
is not necessarily of constant dimension. We denote by $\Theta_t$ the support of
$\pi_t(d\theta_t)$.

The study of the asymptotic properties of sequential Monte Carlo methods 
is admittedly a difficult problem, and some methodological papers [Liu and Chen
(1998), e.g.] simply state some form of the law of large numbers for the
most elaborate algorithms, that is, the Monte Carlo estimates are shown 
to converge almost surely to the quantity of interest as H, the number 
of particles, tends toward infinity. More refined convergence results have 
been obtained, such as the central limit theorem of Del Moral and Guionnet
(1999), later completed by Del Moral and Miclo (2000), or upper bounds
for the Monte Carlo error expressed in various norms [Crisan and Lyons 
(1997, 1999), Crisan, Gaines and Lyons (1998), Crisan and Doucet (2000), 
Del Moral and Guionnet (2001), Künsch (2001) and Le Gland and Oudjane
(2004)]. Unfortunately, it has been, in general, at the expense of generality 
[with the exception of Crisan and Doucet (2000)], whether in terms of com- 
putational implementation (only basic algorithms are considered, which may 
not be optimal) or of applicability (the sequence $(\pi_t)$ has to be generated from
some specific dynamical model that fulfills various conditions). 

In this paper we derive a central limit theorem that applies to most of the 
sequential Monte Carlo techniques developed recently in the methodological
literature, including the resample-move algorithm of Gilks and Berzuini
(2001), the auxiliary particle filter of Pitt and Shephard (1999) and the
stochastic remainder resampling scheme [Baker (1985, 1987)], also known 
as the residual resampling scheme [Liu and Chen (1998)]. No assumption is 
made on the model that generates the sequence of distributions of interest 
$(\pi_t)$, so that our theorem equally applies to those recent algorithms [Chopin
(2002), Del Moral and Doucet (2002) and Cappé, Guillin, Marin and Robert
(2004)] that have been developed for contexts that widely differ from the 
standard application of sequential Monte Carlo methods, namely, the se- 
quential analysis of state space models. 

The appeal of a central limit theorem is that it provides an (asymptot- 
ically) exact measure of the Monte Carlo error, through the asymptotic 
variance. This allows for a rigorous comparison of the relative efficiency 
of given algorithms. In this way, we show in this paper, again by comparing
the appropriate asymptotic variances, that the residual resampling
scheme always outperforms the multinomial resampling scheme, and that the 
Rao-Blackwell variance reduction technique of Doucet, Godsill and Andrieu 
(2000) is, indeed, effective. 

The most promising application of our central limit theorem is the pos- 
sibility to assess the stability of a given particle filter (in terms of preci- 
sion of the computed estimates) through the time behavior of the corre- 
sponding asymptotic variances. This is a critical issue since it is well known 
that sequential Monte Carlo methods tend to degenerate in a number of 
cases, sometimes at a very fast rate. We consider in this paper some typical 
Bayesian problems, such as the sequential analysis of state-space models. 
We will show that under some conditions stability can be achieved at least 
for "filtering" the states, that is, for approximating the marginal posterior 
density $\pi_t(x_t)$, where $x_t$ stands for the current state at iteration $t$.

The paper is organized as follows. Section 2 proposes a generic description 
of particle filters, establishes a central limit theorem for computed estimates 
in a general framework and draws some conclusions from this result. Sec- 
tion 3 discusses the stability of particle filters through the time behavior of 
the asymptotic variances provided by the central limit theorem. Proofs of 
theorems are put in the Appendix. 



2. Central limit theorem for particle filters. 



2.1. General formulation of particle filters. In full generality, a particle
system is a triangular array of random variables in $\Theta \times \mathbb{R}^+$,

    $(\theta^{(j,H)}, w^{(j,H)})_{j \le H},$

where $\Theta$ is some space of interest. The variables $\theta^{(j,H)}$ are usually called
"particles," and their contribution to the sample may vary according to
their weights $w^{(j,H)}$. We will say that this particle system targets a given
distribution $\pi$ defined on $\Theta$ if and only if

(1)   $\frac{\sum_{j=1}^{H} w^{(j,H)} \varphi(\theta^{(j,H)})}{\sum_{j=1}^{H} w^{(j,H)}} \to E_{\pi}(\varphi)$

holds almost surely as $H \to +\infty$ for any measurable function $\varphi$ such that the
expectation above exists. A first example of a particle system is a denumerable
set of independent draws from $\pi$, with unit weights, which obviously
targets $\pi$. In this simple case, particles and weights do not depend on $H$,
and the particle system is a sequence rather than a triangular array. This 
is not the case in general, however, and, while cumbersome, the dependence 
in H will be maintained in notation to allow for a rigorous mathematical
treatment. 
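
The notion of "targeting" can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical setting that is not taken from the paper: particles are drawn independently from N(0, 2^2), the target $\pi$ is N(0, 1), and the weights are the ratio of the two densities. The names `normal_pdf` and `weighted_estimate` are illustrative only.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def weighted_estimate(particles, weights, phi):
    """Self-normalized weighted average of phi over the particle system."""
    total = sum(weights)
    return sum(w * phi(th) for th, w in zip(particles, weights)) / total

rng = random.Random(0)
H = 20000  # number of particles

# Assumption: particles come from N(0, 2^2) while the target pi is N(0, 1).
particles = [rng.gauss(0.0, 2.0) for _ in range(H)]
weights = [normal_pdf(th, 0.0, 1.0) / normal_pdf(th, 0.0, 2.0) for th in particles]

# Weighted estimate of E_pi(theta^2), whose exact value is 1 for N(0, 1).
estimate = weighted_estimate(particles, weights, lambda th: th * th)
```

As $H$ grows, `estimate` should approach 1, which is what it means for this weighted system to target N(0, 1).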

Now assume a sequence $(\pi_t)_{t \in \mathbb{N}}$ of distributions defined on a sequence of
probability spaces $(\Theta_t)$. In most, if not all, applications, $\Theta_t$ will be a power
of the real line or some subset of it, and, henceforth, $\pi_t(\cdot)$ will also denote
the density of $\pi_t$ with respect to an appropriate version of the Lebesgue
measure. A sequential Monte Carlo algorithm (or particle filter) is a method
for producing a particle system whose target evolves in time: at iteration $t$ of
the algorithm, the particle system targets $\pi_t$, and therefore allows for Monte
Carlo approximations of the distribution of (current) interest $\pi_t$. Clearly,
particle filters do not operate in practice on infinite triangular arrays but 
rather manipulate particle vectors of fixed size H. One must keep in mind, 
however, that the justification of such methods is essentially asymptotic. 

The structure of a particle filter can be decomposed into three basic iterative
operations, which will be referred to hereafter as mutation, correction and
selection steps. At the beginning of iteration $t$, consider a particle system
$(\hat\theta_{t-1}^{(j,H)}, 1)_{j \le H}$, that is, with unit weights, which targets $\pi_{t-1}$. The mutation
step consists in producing new particles drawn from

    $\theta_t^{(j,H)} \sim k_t(\hat\theta_{t-1}^{(j,H)}, d\theta_t),$

where $k_t$ is a transition kernel which maps $\Theta_{t-1}$ into $\mathcal{P}(\Theta_t)$, the set of probability
measures on $\Theta_t$. The "mutated" particles (with unit weights) target
the new distribution $\tilde\pi_t(\cdot) = \int \pi_{t-1}(\theta_{t-1}) k_t(\theta_{t-1}, \cdot)\, d\theta_{t-1}$. This distribution
$\tilde\pi_t$ is usually not relevant to the considered application, but rather serves
as an intermediary stage for practical reasons. To shift the target to the
distribution of interest $\pi_t$, particles are assigned weights

    $w_t^{(j,H)} \propto \upsilon_t(\theta_t^{(j,H)})$ with $\upsilon_t(\cdot) = \pi_t(\cdot)/\tilde\pi_t(\cdot).$

This is the correction step. The particle system $(\theta_t^{(j,H)}, w_t^{(j,H)})_{j \le H}$ targets $\pi_t$.
The function $\upsilon_t$ is referred to as the weight function. Note that the normalizing
constants of the densities $\pi_t$ and $\tilde\pi_t$ are intractable in most applications.
This is why weights are defined up to a multiplicative constant, which has
no bearing anyway on the estimates produced by the algorithm, since they
are weighted averages.

Finally, the selection step consists in replacing the current vector of particles
by a new, uniformly weighted vector $(\hat\theta_t^{(j,H)}, 1)_{j \le H}$, which contains a
number $n^{(j,H)}$ of replicates of particle $\theta_t^{(j,H)}$, $n^{(j,H)} \ge 0$. The $n^{(j,H)}$'s are
random variables such that $\sum_j n^{(j,H)} = H$ and $E(n^{(j,H)}) = H\rho_j$, where the
normalized weights are given by

    $\rho_j = w_t^{(j,H)} \big/ \sum_{j'=1}^{H} w_t^{(j',H)},$

and where dependencies in $H$ and $t$ are omitted for convenience. In this
way, particles whose weights are too small are discarded, while particles 
with large weights serve as multiple starting points for the next mutation
step. There are various ways of generating the $n^{(j,H)}$'s. Multinomial
resampling [Gordon, Salmond and Smith (1993)] amounts to drawing independently
the $H$ new particles from the multinomial distribution which
produces $\theta_t^{(j,H)}$ with probability $\rho_j$. Residual resampling [originally termed
"stochastic remainder sampling" in the genetic algorithm literature, Baker
(1985, 1987), then rediscovered by Liu and Chen (1998)] consists in reproducing
$\lfloor H\rho_j \rfloor$ times each particle $\theta_t^{(j,H)}$, where $\lfloor \cdot \rfloor$ stands for the integer
part. The particle vector is completed by $H^r = H - \sum_j \lfloor H\rho_j \rfloor$ independent
draws from the multinomial distribution which produces $\theta_t^{(j,H)}$ with probability
$(H\rho_j - \lfloor H\rho_j \rfloor)/H^r$. Systematic resampling [another method initially
proposed in the genetic algorithm field, Whitley (1994), then rediscovered by
Carpenter, Clifford and Fearnhead (1999); see also Crisan and Lyons (2002)
for a slightly different algorithm] is another interesting selection scheme,
which is such that the number of replicates $n^{(j,H)}$ is ensured to differ from
$H\rho_j$ by at most one. We were, however, unable to extend our results to this third
selection scheme.
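
The two selection schemes covered by the results of this paper can be sketched as follows. This is an illustrative implementation under simple assumptions (weights given as a plain list, replicate counts returned rather than particles); the function names are hypothetical and do not come from the paper.

```python
import math
import random

def normalize(weights):
    """Return the normalized weights rho_j."""
    total = sum(weights)
    return [w / total for w in weights]

def multinomial_counts(weights, rng):
    """Replicate counts from H independent draws with probabilities rho_j."""
    H = len(weights)
    rho = normalize(weights)
    counts = [0] * H
    for j in rng.choices(range(H), weights=rho, k=H):
        counts[j] += 1
    return counts

def residual_counts(weights, rng):
    """Replicate counts under residual resampling.

    Each particle is first reproduced floor(H * rho_j) times; the remaining
    H^r slots are filled by independent draws with probabilities
    proportional to the fractional parts H * rho_j - floor(H * rho_j).
    """
    H = len(weights)
    rho = normalize(weights)
    counts = [math.floor(H * r) for r in rho]
    remaining = H - sum(counts)
    if remaining > 0:
        residuals = [H * r - c for r, c in zip(rho, counts)]
        for j in rng.choices(range(H), weights=residuals, k=remaining):
            counts[j] += 1
    return counts

rng = random.Random(1)
weights = [1.0, 2.0, 3.0, 4.0, 10.0]  # toy unnormalized weights, H = 5
n_multi = multinomial_counts(weights, rng)
n_resid = residual_counts(weights, rng)
```

In both cases the counts sum to $H$; under the residual scheme the deterministic part guarantees at least $\lfloor H\rho_j \rfloor$ replicates of each particle, which is the source of its smaller variance.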

The structure of a particle filter can be summarized as follows: 

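1. Mutation: Draw for $j = 1, \ldots, H$,

    $\theta_t^{(j,H)} \sim k_t(\hat\theta_{t-1}^{(j,H)}, d\theta_t),$

where $k_t : \Theta_{t-1} \to \mathcal{P}(\Theta_t)$ is a given probability kernel.

2. Correction: Assign weights to particles so that, for $j = 1, \ldots, H$,

    $w_t^{(j,H)} \propto \upsilon_t(\theta_t^{(j,H)}) = \pi_t(\theta_t^{(j,H)})/\tilde\pi_t(\theta_t^{(j,H)}),$

where $\tilde\pi_t(\cdot) = \int \pi_{t-1}(\theta_{t-1}) k_t(\theta_{t-1}, \cdot)\, d\theta_{t-1}$.

3. Selection: Resample, according to a given selection scheme,

    $(\theta_t^{(j,H)}, w_t^{(j,H)})_{j \le H} \to (\hat\theta_t^{(j,H)}, 1)_{j \le H}.$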

The first mutation step, $t = 0$, is assumed to draw independent and identically
distributed particles from some instrumental distribution $\tilde\pi_0$.

It is shown without difficulty that the particle system produced by this 
generic algorithm does iteratively target the distributions of interest, that 
is, the following convergences hold almost surely: 

    $H^{-1} \sum_{j=1}^{H} \varphi(\hat\theta_t^{(j,H)}) \to E_{\pi_t}(\varphi),$

as $H \to +\infty$, provided these expectations exist. These convergences will be
referred to as the law of large numbers for particle filters. 
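
The three steps above can be sketched in a few lines. The example below is a toy illustration under assumptions chosen here for convenience, not a setting from the paper: the targets are $\pi_t = N(0.5t, 1)$, the instrumental distribution $\tilde\pi_0$ is N(0, 2) (variance 2), and the mutation kernel adds N(0, 1) noise, so that $\tilde\pi_t = N(0.5(t-1), 2)$ for $t > 0$ and the weight function is available in closed form. Selection uses multinomial resampling.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def weight_fn(t, theta):
    """Toy weight function pi_t / pi~_t for the assumed Gaussian sequence."""
    mutated_mean = 0.0 if t == 0 else 0.5 * (t - 1)
    return normal_pdf(theta, 0.5 * t, 1.0) / normal_pdf(theta, mutated_mean, 2.0)

def particle_filter(T, H, rng):
    """Run mutation, correction, selection for t = 0..T.

    Returns the weighted estimates of E_{pi_t}(theta), computed before
    each selection step.
    """
    estimates = []
    resampled = []
    for t in range(T + 1):
        # Mutation: draw from pi~_0 at t = 0, otherwise move each particle.
        if t == 0:
            particles = [rng.gauss(0.0, math.sqrt(2.0)) for _ in range(H)]
        else:
            particles = [th + rng.gauss(0.0, 1.0) for th in resampled]
        # Correction: weights are proportional to the weight function.
        weights = [weight_fn(t, th) for th in particles]
        total = sum(weights)
        estimates.append(sum(w * th for w, th in zip(weights, particles)) / total)
        # Selection: multinomial resampling back to unit weights.
        resampled = rng.choices(particles, weights=weights, k=H)
    return estimates

estimates = particle_filter(T=5, H=5000, rng=random.Random(0))
```

Each entry of `estimates` should be close to the exact mean $0.5t$ of the corresponding target, in line with the law of large numbers stated above.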

2.2. Some examples of particle filters. The general formulation given in
the previous section encompasses most of the sequential Monte Carlo algorithms
described in the literature. By way of illustration, assume first that
the distributions $\pi_t$ are defined on a common space, $\Theta_t = \Theta$. In a Bayesian
framework, $\pi_t$ will usually be the posterior density of $\theta$, given the first $t$
observations, $\pi_t(\theta) = \pi(\theta|y_{1:t})$, where $y_{1:t}$ denotes the sequence of observations
$y_1, \ldots, y_t$. If particles are not mutated, $k_t$ being the "identity kernel"
$k_t(\theta, \cdot) = \delta_\theta$, we have $\tilde\pi_t = \pi_{t-1}$ for $t > 0$, and our generic particle filter becomes
one of the variations of the sequential importance resampling algorithm
[Rubin (1988), Gordon, Salmond and Smith (1993) and Liu and Chen
(1998)]. The weight function simplifies to

    $\upsilon_t(\theta) = \pi(\theta|y_{1:t})/\pi(\theta|y_{1:t-1}) \propto p(y_t|y_{1:t-1}, \theta)$

in a Bayesian model, where $p(y_t|y_{1:t-1}, \theta)$ is the conditional likelihood of $y_t$,
given the parameter $\theta$ and previous observations.

Gilks and Berzuini (2001) propose a variant of this algorithm, namely, 
the resample-move algorithm, in which particles are mutated according to 
an MCMC [Markov chain Monte Carlo; see, e.g., Robert and Casella (1999)] 
kernel $k_t$, which admits $\pi_{t-1}$ as an invariant density. In that case, we still
have $\tilde\pi_t = \pi_{t-1}$, and the expression for the weight function $\upsilon_t$ is unchanged.
The motivation of this strategy is to add new particle values along iterations 
so as to limit the depletion of the particle system. 

Now consider the case where $\pi_t$ is defined on a space of increasing dimension
of the form $\Theta_t = \mathcal{X}^t$. A typical application is the sequential inference of
a dynamical model which involves a latent process $(x_t)$, and $\pi_t$ stands then
for density $\pi(x_{1:t}|y_{1:t})$. Assume $k_t$ can be decomposed as

    $k_t(x^*_{1:t-1}, dx_{1:t}) = K_t(x^*_{1:t-1}, dx_{1:t-1})\, q_t(x_t|x_{1:t-1})\, dx_t,$

where $K_t : \mathcal{X}^{t-1} \to \mathcal{P}(\mathcal{X}^{t-1})$ is a transition kernel, and $q_t(\cdot|\cdot)$ is some conditional
probability density. If $K_t$ admits $\pi_{t-1}$ as an invariant density, the
weight function is given by

(2)   $\upsilon_t(x_{1:t}) = \frac{\pi_t(x_{1:t})}{\pi_{t-1}(x_{1:t-1})\, q_t(x_t|x_{1:t-1})}.$



Again, the case where $K_t$ is the identity kernel corresponds to some version of
the sequential importance resampling algorithm, while setting $K_t$ to a given
MCMC transition kernel with invariant density $\pi_{t-1}$ leads to the resample-move
algorithm of Gilks and Berzuini (2001). The standard choice for $q_t(\cdot|\cdot)$
is the conditional prior density of $x_t$, given $x_{1:t-1}$, as suggested originally
by Gordon, Salmond and Smith (1993), but this is not always optimal, as
pointed out by Pitt and Shephard (1999) and Doucet, Godsill and Andrieu
(2000). In fact, it is generally more efficient to build some conditional density
$q_t$ which takes into account the information carried by $y_t$ in some way, in
order to simulate more values compatible with the observations.

These two previous cases can be combined into one, by considering a
dynamical model which features at the same time a fixed parameter $\theta$ and
a sequence of latent variables $(x_t)$, so that $\Theta_t = \Theta \times \mathcal{X}^t$, and $\pi_t$ stands for
the joint posterior density $\pi(\theta, x_{1:t}|y_{1:t})$.

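2.3. Central limit theorem. The following quantities will play the role
of asymptotic variances in our central limit theorem. Let, for any measurable
$\varphi : \Theta_0 \to \mathbb{R}^d$, $\tilde V_0(\varphi) = \mathrm{Var}_{\tilde\pi_0}(\varphi)$, and by induction, for any measurable
$\varphi : \Theta_t \to \mathbb{R}^d$,

(3)   $\tilde V_t(\varphi) = \hat V_{t-1}\{E_{k_t}(\varphi)\} + E_{\pi_{t-1}}\{\mathrm{Var}_{k_t}(\varphi)\}, \quad t > 0,$

(4)   $V_t(\varphi) = \tilde V_t\{\upsilon_t \cdot (\varphi - E_{\pi_t}\varphi)\}, \quad t \ge 0,$

(5)   $\hat V_t(\varphi) = V_t(\varphi) + \mathrm{Var}_{\pi_t}(\varphi), \quad t \ge 0.$

The notation $E_{k_t}(\varphi)$ and $\mathrm{Var}_{k_t}(\varphi)$ is shorthand for the functions $\mu(\theta_{t-1}) =
E_{k_t(\theta_{t-1},\cdot)}\{\varphi(\cdot)\}$ and $\Sigma(\theta_{t-1}) = \mathrm{Var}_{k_t(\theta_{t-1},\cdot)}\{\varphi(\cdot)\}$, respectively. Note that these
equations do not necessarily produce finite variances for any $\varphi$. We now specify
the classes of functions for which the central limit theorem enunciated
below will hold, and, in particular, for which these asymptotic variances exist.
Denoting by $\|\cdot\|$ the Euclidean norm in $\mathbb{R}^d$, we define recursively $\Phi_t^{(d)}$
to be the set of measurable functions $\varphi : \Theta_t \to \mathbb{R}^d$ such that for some $\delta > 0$,

(6)   $E_{\tilde\pi_t} \|\upsilon_t \cdot \varphi\|^{2+\delta} < +\infty,$

and that the function $\theta_{t-1} \mapsto E_{k_t(\theta_{t-1},\cdot)}\{\upsilon_t(\cdot)\varphi(\cdot)\}$ is in $\Phi_{t-1}^{(d)}$. The initial set
$\Phi_0^{(d)}$ contains all the measurable functions whose moments of order two with
respect to $\pi_0$ are finite.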

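Theorem 1. If the selection step consists of multinomial resampling,
and provided that the unit function $\theta_t \mapsto 1$ belongs to $\Phi_t^{(1)}$ for every $t$, then
for any $\varphi \in \Phi_t^{(d)}$, $E_{\pi_t}(\varphi)$, $V_t(\varphi)$ and $\hat V_t(\varphi)$ are finite quantities, and the
following convergences in distribution hold as $H \to +\infty$:

    $H^{1/2}\left\{\frac{\sum_{j=1}^{H} w_t^{(j,H)} \varphi(\theta_t^{(j,H)})}{\sum_{j=1}^{H} w_t^{(j,H)}} - E_{\pi_t}(\varphi)\right\} \xrightarrow{D} N\{0, V_t(\varphi)\},$

    $H^{1/2}\left\{H^{-1} \sum_{j=1}^{H} \varphi(\hat\theta_t^{(j,H)}) - E_{\pi_t}(\varphi)\right\} \xrightarrow{D} N\{0, \hat V_t(\varphi)\}.$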

A proof is given in the Appendix. In the course of the proof an additional
central limit theorem is established for the unweighted particle system
$(\theta_t^{(j,H)}, 1)$ produced by the mutation step, which targets $\tilde\pi_t$. This result is not
given here, however, for it holds for a slightly different class of functions,
and is of less practical interest. The assumption that the function $\theta_t \mapsto 1$
belongs to $\Phi_t^{(1)}$ deserves further comment. Qualitatively, it implies that the
weight function $\upsilon_t$ has finite moment of order $2 + \delta$ with respect to $\tilde\pi_t$, for
some $\delta > 0$, and, therefore, restricts somehow the dispersion of the particle
weights. It also implies that $\Phi_t^{(d)}$ contains all bounded functions $\varphi$. In
practice this assumption will be fulfilled, for instance, whenever each weight
function $\upsilon_t$ is bounded from above, which occurs in many practical settings.

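A central limit theorem also holds when the selection step follows the
residual resampling scheme of Liu and Chen (1998), but this imposes some
change in the expression for the asymptotic variances. The new expression
for $\hat V_t(\varphi)$ is

(7)   $\hat V_t(\varphi) = V_t(\varphi) + R_t(\varphi),$

where

(8)   $R_t(\varphi) = E_{\tilde\pi_t}\{r(\upsilon_t)\varphi\varphi'\} - \frac{1}{E_{\tilde\pi_t}\{r(\upsilon_t)\}} [E_{\tilde\pi_t}\{r(\upsilon_t)\varphi\}][E_{\tilde\pi_t}\{r(\upsilon_t)\varphi\}]',$

and $r(x)$ is $x$ minus its integer part.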

Theorem 2. The results of Theorem 1 still hold when the selection step
consists of residual resampling, except that the asymptotic variances are now
defined by equations (3), (4) and (7). 

The proofs of Theorems 1 and 2 (given in the Appendix) rely on an induc- 
tion argument: conditional on past iterations, each step generates indepen- 
dent (but not identically distributed) particles, which follow some (condi- 
tional) central limit theorem. In contrast, the systematic resampling scheme 
is such that, given the previous particles, the new particle system is entirely 
determined by a single draw from a uniform distribution; see Whitley (1994). 
This is why extending our results to this third selection scheme does not seem
straightforward, and possibly requires an entirely different approach.

The appeal of the recursive formulae (3)-(5) and (7) is that they put
forward the impact of each new step on the asymptotic variance, particularly
the additive effect of the selection and mutation steps. In the multinomial
case, an alternative expression for the asymptotic variance is

(9)   $V_t(\varphi) = \sum_{k=0}^{t} E_{\tilde\pi_k}[\upsilon_k^2\, \mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}\, \mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}'],$

where $\mathcal{E}_t$ is the functional operator which associates to $\varphi$ the function

(10)   $\mathcal{E}_t(\varphi) : \theta_{t-1} \mapsto E_{k_t(\theta_{t-1},\cdot)}\{\upsilon_t(\cdot)\varphi(\cdot)\},$

and $\mathcal{E}_{k+1:t}(\varphi) = \mathcal{E}_{k+1} \circ \cdots \circ \mathcal{E}_t(\varphi)$ for $k + 1 \le t$, $\mathcal{E}_{t+1:t}(\varphi) = \varphi$. This closed
form expression is more convenient when studying the stability of the asymptotic
variance over time, as we will illustrate in the next section. A similar
formula for the residual case can be obtained indirectly by deriving the
difference between the multinomial and the residual cases, that is, for $t > 0$,

(11)   $V_t^r(\varphi) - V_t(\varphi) = \sum_{k=0}^{t-1} [R_k\{\mathcal{E}_{k+1:t}(\varphi)\} - \mathrm{Var}_{\pi_k}\{\mathcal{E}_{k+1:t}(\varphi)\}],$

where $V_t(\varphi)$, $V_t^r(\varphi)$ are defined through the recursions (3)-(5) and (3),
(4) and (7), respectively. In the following, we will similarly distinguish the
residual case through an r-suffix in notation.
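
As a quick consistency check (a sketch added for illustration, not part of the proofs in the Appendix), setting $t = 0$ in (9) leaves the single term $k = 0$, with $\mathcal{E}_{1:0}(\varphi) = \varphi$. The closed form then coincides with definition (4) at $t = 0$, and is the familiar asymptotic variance of self-normalized importance sampling:

```latex
V_0(\varphi)
  = E_{\tilde\pi_0}\bigl[\upsilon_0^2\,
      \{\varphi - E_{\pi_0}(\varphi)\}\{\varphi - E_{\pi_0}(\varphi)\}'\bigr]
  = \operatorname{Var}_{\tilde\pi_0}\bigl[\upsilon_0 \cdot
      \{\varphi - E_{\pi_0}(\varphi)\}\bigr]
  = \tilde V_0\bigl[\upsilon_0 \cdot \{\varphi - E_{\pi_0}(\varphi)\}\bigr].
```

The second equality uses $E_{\tilde\pi_0}[\upsilon_0\{\varphi - E_{\pi_0}(\varphi)\}] = E_{\pi_0}\{\varphi - E_{\pi_0}(\varphi)\} = 0$, which holds when $\upsilon_0 = \pi_0/\tilde\pi_0$ is the ratio of normalized densities.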

2.4. First conclusions. A first application of this central limit theorem is 
to provide a rigorous justification for some heuristic principles that have been 
stated in the literature, see, for instance, Liu and Chen (1998). Inequalities 
in this section refer to the canonical order for symmetric matrices, that is
to say $A > B$ (resp. $A \ge B$) if and only if $A - B$ is positive definite (resp.
positive semidefinite).

First, it is preferable to compute any estimate before the selection step, 
since the immediate effect of the latter is a net increase in asymptotic variance:
$\hat V_t(\varphi) > V_t(\varphi)$ for any nonconstant function $\varphi$. In this respect one may
wonder why selection steps should be performed. We will see that the im- 
mediate degradation of the particle system is often largely compensated for 
by gains in precision in the future iterations. 

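Second, residual resampling always outperforms multinomial resampling.
Let $\varphi : \Theta_t \to \mathbb{R}^d$ and $\bar\varphi = \varphi - E_{\pi_t}(\varphi)$. Then

    $R_t(\varphi) = R_t(\bar\varphi) \le E_{\tilde\pi_t}\{r(\upsilon_t)\bar\varphi\bar\varphi'\} \le \mathrm{Var}_{\pi_t}(\varphi),$

since $r(x) \le x$. It follows from this inequality and (11) that $V_t^r(\varphi) \le V_t(\varphi)$.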
Actually, a substantial gain should be expected when using the residual 
scheme since the inequality above is clearly not sharp. 
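
The chain of inequalities above can be checked by hand on a small discrete example. The sketch below assumes a hypothetical setting chosen for illustration: the mutated distribution is uniform on four points, the weight function takes the listed values (with mean one under that distribution) and the test function is scalar. All names are illustrative.

```python
import math

def mean(values):
    """Expectation under the uniform distribution on the support points."""
    return sum(values) / len(values)

# Hypothetical toy setting: uniform mutated distribution on four points.
upsilon = [0.25, 0.5, 1.0, 2.25]   # weight function values, mean one
phi = [-1.0, 0.0, 1.0, 2.0]        # scalar test function values

# Expectation and variance of phi under the target, via the weights.
e_pi_phi = mean([u * p for u, p in zip(upsilon, phi)])
phi_bar = [p - e_pi_phi for p in phi]
var_pi_phi = mean([u * pb * pb for u, pb in zip(upsilon, phi_bar)])

# Fractional parts r(upsilon) and the residual term for a scalar function.
r = [u - math.floor(u) for u in upsilon]
middle_term = mean([ri * pb * pb for ri, pb in zip(r, phi_bar)])
residual_term = middle_term - mean([ri * pb for ri, pb in zip(r, phi_bar)]) ** 2 / mean(r)
```

Here `residual_term`, `middle_term` and `var_pi_phi` play the roles of $R_t(\varphi)$, $E_{\tilde\pi_t}\{r(\upsilon_t)\bar\varphi^2\}$ and $\mathrm{Var}_{\pi_t}(\varphi)$, and they come out in increasing order, with a visible gap between the first and the last.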

Our central limit theorem also provides a formal justification for resorting 
to "marginalized" particle filters, as explained in the following section. 




2.5. Marginalized particle filters. In some specific cases it is possible to
decompose the density $\pi_t(\theta_t)$ into $\pi_t^m(\xi_t)\pi_t^c(\lambda_t|\xi_t)$, with $\theta_t = (\xi_t, \lambda_t)$ lying
in $\Theta_t = \Xi_t \times \Lambda_t$, in such a way that it is possible to implement a particle
filter that targets the marginal densities $\pi_t^m$ rather than the $\pi_t$'s.
When this occurs, this second algorithm usually produces more precise
estimators (in a sense that we explain below) in the $\xi_t$-dimension. The
idea of resorting to "marginalized" particle filters has been formalized by 
Doucet, Godsill and Andrieu (2000), and implemented in various settings 
by Chen and Liu (2000), Chopin (2001) and Andrieu and Doucet (2002), 
among others. 

Doucet, Godsill and Andrieu's (2000) justification for resorting to "marginal- 
ized" particle filters is that they yield importance weights with a smaller 
variance than their "unmarginalized" counterpart, which suggests that the 
produced estimates are also less variable. This is proven by a Rao-Blackwell 
decomposition, and, consequently, "marginalized" particle filters are some- 
times referred to as "Rao-Blackwellized" particle filters. We now extend the 
argument of these authors by proving that the asymptotic variance of any 
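estimator is, indeed, smaller in the "marginalized" case. Assume decompositions
of $\pi_t$ and $\tilde\pi_t$ of the form

    $\pi_t(\theta_t) = \pi_t^m(\xi_t)\pi_t^c(\lambda_t|\xi_t), \qquad \tilde\pi_t(\theta_t) = \tilde\pi_t^m(\xi_t)\tilde\pi_t^c(\lambda_t|\xi_t),$

where $(\xi_t, \lambda_t)$ identifies to $\theta_t$, and $\pi_t^m$, $\pi_t^c$, $\tilde\pi_t^m$, $\tilde\pi_t^c$ are, respectively, marginal
and conditional densities of $\xi_t$ and $\lambda_t$. Consider two particle filters, tracking,
respectively, $(\pi_t)$ and $(\pi_t^m)$. It is assumed that both filters implement the
same selection scheme (whether multinomial or residual), and that their
mutation steps consist in drawing, respectively, from kernels $k_t$ and $k_t^m$,
which are such that the following probability measures coincide on $\Theta_t =
\Xi_t \times \Lambda_t$,

(12)   $\int \pi_{t-1}^c(\lambda_{t-1}|\xi_{t-1})\, k_t\{(\xi_{t-1}, \lambda_{t-1}), (d\xi_t, d\lambda_t)\}\, d\lambda_{t-1} = k_t^m(\xi_{t-1}, d\xi_t)\, \tilde\pi_t^c(\lambda_t|\xi_t)\, d\lambda_t,$

for almost every $\xi_{t-1}$ in $\Xi_{t-1}$. Note that in full generality it is not always
possible to build a kernel $k_t^m$ from a given $k_t$ which satisfies this relation. As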
illustrated by the aforementioned references, however, it is feasible in some 
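cases of interest. This equality implies, in particular, that

    $\tilde\pi_t^m(\xi_t) = \int \pi_{t-1}^m(\xi_{t-1})\, k_t^m(\xi_{t-1}, \xi_t)\, d\xi_{t-1}.$

Asymptotic variances and other quantities are distinguished similarly through
the m-suffix for the marginal case, that is, $V_t(\varphi)$ and $V_t^m(\varphi)$, and so on.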

Theorem 3. For any $\varphi : \Xi_t \to \mathbb{R}^d$ such that $\varphi \in \Phi_t^{(d)}$, we have $V_t^m(\varphi) \le
V_t(\varphi)$ and $V_t^{m,r}(\varphi) \le V_t^r(\varphi)$. These inequalities become equalities for a nonconstant
$\varphi$ if and only if $\pi_t^c(\cdot|\xi_t) = \tilde\pi_t^c(\cdot|\xi_t)$ for almost every $\xi_t \in \Xi_t$, for any
$t \ge 0$.

As suggested by the condition for equality above or more clearly exhibited 
in the proof in the Appendix, marginalizing allows for canceling the weight 
dispersion due to the discrepancy between conditional densities $\pi_t^c$ and $\tilde\pi_t^c$,
while the part due to the discrepancy between marginal densities $\pi_t^m$ and $\tilde\pi_t^m$
remains identical. 

Beyond the small number of cases where this marginalization technique 
can be effectively carried out, this result has also strong qualitative impli- 
cations. In the following sections we will study the behavior of the time 
sequence $V_t(\varphi)$ in order to measure whether and at which rate a given particle
cle filter "diverges." In this respect, we will be able in some cases to build a 
marginalized particle filter whose rate of divergence is theoretically known, 
thus providing a lower bound for the actual rate of divergence of the con- 
sidered particle filter. 



3. Stability of particle filters. 

3.1. Sequential importance sampling. The sequential importance sam- 
pling algorithm is a particle filter that alternates mutation and correction 
steps, but does not perform any selection step. Weights are consequently not 
initialized to one at each iteration, and are rather updated through 

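We suppress any notational dependence on $H$ since it is meaningless in such
a case. Due to its specific nature, this algorithm needs to be treated separately.
Since particles are not resampled, they remain independent through
iterations. It follows via the standard central limit theorem that

    $H^{1/2}\left\{\frac{\sum_{j=1}^{H} w_t^{(j)} \varphi(\theta_t^{(j)})}{\sum_{j=1}^{H} w_t^{(j)}} - E_{\pi_t}(\varphi)\right\} \xrightarrow{D} N\{0, V_t^{sis}(\varphi)\},$

where the corresponding asymptotic variance is

    $V_t^{sis}(\varphi) = E_{\tilde\pi_t}\left[\left(\frac{\pi_t}{\tilde\pi_t}\right)^2 \{\varphi - E_{\pi_t}(\varphi)\}\{\varphi - E_{\pi_t}(\varphi)\}'\right],$

and $\tilde\pi_t$ denotes this time the generating distribution of particles $\theta_t^{(j)}$ obtained
by the recursion of mutation kernels $k_t(\cdot, \cdot)$, that is,

    $\tilde\pi_t(\cdot) = \int \tilde\pi_{t-1}(\theta_{t-1})\, k_t(\theta_{t-1}, \cdot)\, d\theta_{t-1},$

the distribution $\tilde\pi_0$ being arbitrary. Sequential importance sampling is rarely
an efficient algorithm, but the value of $V_t^{sis}(\varphi)$ can serve as a benchmark on
some occasions, as we will see in the following.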

3.2. Sequential importance sampling and resampling in the fixed parameter
case. In the fixed parameter case, that is, $\Theta_t = \Theta$ and $\pi_t(\theta) = \pi(\theta|y_{1:t})$,
$\pi_t$ is expected to become more and more informative on $\theta$, and to eventually
converge to a Dirac mass at some point $\theta_0$. Sequential importance sampling
and resampling algorithms typically diverge in such a situation, since they
generate once and for all the set of particle values from $\tilde\pi_0$, a majority of
which are presumably far from $\theta_0$. The following result quantifies this degeneracy
effect.

Theorem 4. Let $\varphi : \Theta \to \mathbb{R}^d$, $\varphi \in \Phi_t^{(d)}$. Then under regularity conditions
given in the Appendix, there exist positive constants $c_1$, $c_2$ and $c_3$ such that

    $\|V_t^{sis}(\varphi)\| \sim c_1 t^{p/2-1}, \qquad \|V_t(\varphi)\| \sim c_2 t^{p/2}, \qquad \|\hat V_t(\varphi)\| \sim c_3 t^{p/2},$

as $t$ goes toward infinity, where $\|\cdot\|$ denotes the Euclidean norm, $p$ is the
dimension of $\Theta$ and $V_t(\varphi)$, $\hat V_t(\varphi)$ refer here to the sequential importance
resampling case, that is, $k_t(\theta, \cdot) = \delta_\theta$.

The conditions mentioned above amount to assuming that $\pi_t$ is the posterior
density of a model regular enough to ensure the existence and asymptotic
normality of the maximum likelihood estimator. Under such conditions, $\pi_t$
can be approximated at first order as a Gaussian distribution centered at
$\theta_0$ with variance $I(\theta_0)^{-1}/t$, where $I(\theta_0)$ is the Fisher information matrix
evaluated at $\theta_0$. The results above are then derived through the Laplace
approximation of integrals; see the Appendix. At first glance, it seems paradoxical
that $V_t^{sis}(\varphi)$ converges to zero when $p = 1$. Note, however, that the
ratio $V_t^{sis}(\varphi)/\mathrm{Var}_{\pi_t}(\varphi)$, which measures the precision of the algorithm relative
to the variation of the considered function, is likely to diverge even when
$p = 1$, since typically $\mathrm{Var}_{\pi_t}(\varphi) \sim I(\theta_0)^{-1}/t$ as $t \to +\infty$.

That the sequential importance resampling algorithm diverges more quickly 
than the sequential importance sampling algorithm in this context is unsur- 
prising: when particles are not mutated, the only effect of a selection step is 
to deplete the particle system. In this respect, we have for any nonconstant 
function $\varphi$,

    $V_t^{sis}(\varphi) < V_t(\varphi) < \hat V_t(\varphi).$

The proof of this inequality is straightforward. 

Given its ease of implementation and the results above, the sequential
importance sampling algorithm may be recommended for studying
short series of observations, provided that the dimension of $\Theta$ is low. But, in
general, one should rather implement a more elaborate particle filter which
includes mutation steps in order to counter the particle depletion. A further
implication of these results is the following. Consider a dynamical model
which involves a fixed parameter $\theta$, and assume that the marginal posterior
distributions $\pi(\theta|y_{1:t})$, obtained by marginalizing out latent variables $x_{1:t}$,
satisfy the regularity conditions of Theorem 4. Then, following the argument
developed in Section 2.5, we get that the rate of divergence of the sequential
importance resampling algorithm for this kind of model is at least of order
$O(t^{p/2})$, where $p$ is the dimension of this fixed parameter.

3.3. Sequential importance sampling and resampling for Bayesian filtering
and smoothing. For simplicity we assume that $\pi_t(x_{1:t}) = \pi(x_{1:t}|y_{1:t})$ is the
posterior density of a state space model with latent Markov process $(x_t)$,
$x_t \in \mathcal{X}$, and observed process $(y_t)$, $y_t \in \mathcal{Y}$, which satisfies the equations

    $y_t | x_t \sim f(y_t|x_t)\, dy_t,$

    $x_t | x_{t-1} \sim g(x_t|x_{t-1})\, dx_t.$

We distinguish two types of functions: those which are defined on common
dimensions of the spaces $\Theta_t = \mathcal{X}^t$, say, $\varphi : x_{1:t} \mapsto \psi(x_k)$, $t \ge k$, and those
which are evaluated on the "last" dimension of $\Theta_t$, that is, $\varphi : x_{1:t} \mapsto \psi(x_t)$.
Evaluating these two types of functions amounts to, respectively, "smoothing"
or "filtering" the states.

The sequential importance sampling algorithm is usually very inefficient 
in such a context, whether for smoothing or filtering the states. We illus- 
trate this phenomenon by a simple example. Assume the tth mutation step 
consists of drawing xt from the prior conditional density g{xt\xt-i), which is 
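usually easy to implement. Consider two evolving particles $\theta_t^{(j)} = x_{1:t}^{(j)}$ with
weights $w_t^{(j)}$, $j = 1, 2$. We have

    $\log\frac{w_t^{(1)}}{w_t^{(2)}} = \sum_{k=1}^{t} \log\frac{f(y_k|x_k^{(1)})}{f(y_k|x_k^{(2)})}.$

Assuming that the joint process $(y_t, x_t^{(1)}, x_t^{(2)})$ is stationary, the sum above
typically satisfies some central limit theorem of the form

(13)   $t^{-1/2} \sum_{k=1}^{t} \log\frac{f(y_k|x_k^{(1)})}{f(y_k|x_k^{(2)})} \xrightarrow{D} N(0, \sigma^2),$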

where the limiting distribution is centered for symmetry reasons. Note that 
this convergence is with respect to the joint probability space of the simulated
processes $x_t^{(j)}$, $j = 1, 2$ and the observation process $(y_t)$, while all our
previous results were for a given sequence of observations. In this way, (13)
yields that the ratio of weights of the two particles either converges or di- 
verges exponentially fast. More generally, when H particles are generated 
initially, very few of them will have a prominent weight after some itera- 
tions, thus leading to very unreliable estimates, whether for smoothing or 
filtering the states. The algorithm suffers from the curse of dimensionality, 
in that its degeneracy grows exponentially with the dimension of the space 
of interest $\Theta_t$.
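
This weight degeneracy is easy to observe in simulation. The sketch below makes several assumptions that are not in the paper: a linear Gaussian state space model with autoregressive coefficient 0.9 and unit noise variances, particles started at zero, and the effective sample size $(\sum_j w_j)^2/\sum_j w_j^2$ used as a standard diagnostic of weight dispersion. Particles are mutated from the prior transition and never resampled.

```python
import math
import random

def effective_sample_size(log_weights):
    """(sum w)^2 / sum w^2, computed stably from log-weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(v * v for v in w)

def sis_ess_path(T, H, rng, coef=0.9):
    """Sequential importance sampling without selection on a toy model.

    Assumed model: x_t = coef * x_{t-1} + N(0, 1), y_t = x_t + N(0, 1).
    Returns the effective sample size after each of the T iterations.
    """
    x = rng.gauss(0.0, 1.0)      # hidden state generating the data
    states = [0.0] * H           # current state of each particle
    log_w = [0.0] * H            # accumulated log-weights
    ess_path = []
    for _ in range(T):
        x = coef * x + rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        for j in range(H):
            # Mutation: draw from the prior transition density.
            states[j] = coef * states[j] + rng.gauss(0.0, 1.0)
            # Correction: multiply the weight by f(y | x), up to a constant.
            log_w[j] += -0.5 * (y - states[j]) ** 2
        ess_path.append(effective_sample_size(log_w))
    return ess_path

H = 500
ess = sis_ess_path(T=50, H=H, rng=random.Random(0))
```

The effective sample size starts at a sizable fraction of $H$ and collapses toward one within a few dozen iterations, which reflects the few-dominant-weights behavior described above.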

We now turn to the sequential importance resampling algorithm, and
remark first that, for $\varphi : x_{1:t} \to \psi(x_1)$ and $t > 0$,

$$\hat V_t(\varphi) > V_t(\varphi) > V_t^{\mathrm{sis}}(\varphi),$$

provided $\psi$ is not constant. The proof of this inequality is straightforward.
The sequential importance resampling algorithm is even more inefficient
than the sequential importance sampling algorithm in smoothing the first
state $x_1$, because the successive selection steps only worsen the deterioration
of the particle system in the $x_1$ dimension. This is consistent with our claim
in Section 2.4 that a selection step always degrades the inference on past
and current states, but may possibly improve the inference on future states.
In this respect, the algorithm is expected to show more capability in filtering
the states, and we now turn to the study of the filtering stability.

The functional operator $\mathcal{E}_t$ which appears in the expression for $V_t(\varphi)$, see
(9), summarizes two antagonistic effects: on one hand, the weight distortion
due to the correction step, and, on the other hand, the rejuvenation of
particles due to the application of the kernel $k_t$. Stability will be achieved
provided that these two effects compensate in some way.

For simplicity, we assume that the state space $\mathcal{X}$ is included in the real
line and that the studied filtering function $\varphi : x_{1:t} \to \psi(x_t)$ is real-valued.
Recall that for the sequential importance resampling algorithm, $k_t$ is given
by

$$k_t(x^*_{1:t-1}, dx_{1:t}) = \delta_{x^*_{1:t-1}}\, q_t(x_t \mid x^*_{1:t-1})\,dx_t,$$

for some given conditional probability density $q_t(\cdot \mid \cdot)$. We assume that $q_t$
only depends on the previous state $x_{t-1}$, and, therefore, defines a Markov
transition. The ability of $q_t$ to "forget the past" is usually expressed through
its contraction coefficient [see Dobrushin (1956)]

$$\rho_t = \frac{1}{2} \sup_{x', x'' \in \mathcal{X}} \| q_t(\cdot \mid x') - q_t(\cdot \mid x'') \|_1,$$

where $\|\cdot\|_1$ stands for the $L_1$-norm. Note $\rho_t \le 1$, and if $\rho_t < 1$, $q_t$ is said to
be strictly contractive. Define the variation of a given function $\psi$ by

$$\Delta\psi = \sup_{x, x' \in \mathcal{X}} |\psi(x) - \psi(x')|.$$
Then the coefficient $\rho_t$ measures the extent to which the application $q_t$ "contracts"
the variation of the considered function, that is, for any $x', x'' \in \mathcal{X}$,

$$\left| \int q_t(x \mid x')\,\psi(x)\,dx - \int q_t(x \mid x'')\,\psi(x)\,dx \right| \le \rho_t\,\Delta\psi. \tag{14}$$

Furthermore, it is known [Dobrushin (1956)] that if $q_t$ is such that, for all
$x, x', x'' \in \mathcal{X}$,

$$\frac{q_t(x \mid x')}{q_t(x \mid x'')} \le C,$$

then its contraction coefficient satisfies $\rho_t \le 1 - C^{-1}$. We therefore make
such assumptions in order to prove the stability of the sequential importance
resampling algorithm.
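
As an illustration, the contraction coefficient and the two bounds above can be checked on a finite state space, where the transition density becomes a row-stochastic matrix and integrals become sums. This discrete setting, the matrix `P` and the function `psi` are assumptions made for the sketch; they do not come from the text.

```python
def contraction_coefficient(P):
    """Dobrushin coefficient: half the largest L1 distance between two rows."""
    return 0.5 * max(
        sum(abs(a - b) for a, b in zip(row_i, row_k))
        for row_i in P
        for row_k in P
    )


def ratio_bound(P):
    """Smallest C with P[i][x] <= C * P[k][x] for all i, k, x (entries must be > 0)."""
    n_states = len(P[0])
    return max(
        row_i[x] / row_k[x] for row_i in P for row_k in P for x in range(n_states)
    )


def variation(values):
    """Variation of a function on a finite set: largest minus smallest value."""
    return max(values) - min(values)


P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
psi = [1.0, -2.0, 4.0]

rho = contraction_coefficient(P)
C = ratio_bound(P)
# Expectation of psi under each row of P (the discrete counterpart of the integrals).
means = [sum(p * v for p, v in zip(row, psi)) for row in P]

print(rho, 1 - 1 / C)                          # rho should not exceed 1 - 1/C
print(variation(means), rho * variation(psi))  # discrete analogue of (14)
```

In this example the coefficient is about 0.3, well below the bound $1 - C^{-1} = 0.6$, which shows that the ratio condition is sufficient but can be far from tight.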



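Theorem 5. Assume that $\Delta\psi < +\infty$ and there exist constants $C$, $\underline{f}$
and $\bar{f}$ such that, for any $t > 0$, $x, x', x'' \in \mathcal{X}$, $y \in \mathcal{Y}$,

$$\frac{g(x \mid x')}{g(x \mid x'')} \le C, \qquad \frac{q_t(x \mid x')}{q_t(x \mid x'')} \le C, \qquad 0 < \underline{f} \le f(y \mid x) \le \bar{f}. \tag{15}$$

Then $V_t(\varphi)$ is bounded from above in $t$ (in the sequential importance
resampling case).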



This theorem is akin to previous results in the literature [see Del Moral and Guionnet
(2001), Le Gland and Oudjane (2004) and most especially, Künsch (2001,
2003)], except that these authors rather consider the stability of some distance
(such as the total variation norm of the difference) between the "true"
filtering density $\pi_t(x_t)$ and the empirical density computed from the particle
system. In fact, Del Moral and Miclo [(2000), page 36] proved that the
actual variance of the Monte Carlo error is bounded from above over time
under similar conditions. Unfortunately, all these results, including ours, require
strong assumptions, such as (15), that are unrealistic when $\mathcal{X}$ is not
compact. Further research will hopefully provide weaker assumptions, but
this may prove an especially arduous problem.

3.4. Resample-move algorithms, variance estimation. Following Gilks and Berzuini
(2001), we term "resample-move algorithm" any particle filter algorithm
which includes an MCMC step in order to reduce degeneracy, as described
in Section 2.2. It seems difficult to make general statements about such
algorithms and we will rather make informal comments.

The fixed parameter case is especially well behaved. Basic particle filters
diverge only at a polynomial rate, as seen in Section 3.2, in contrast with
the exponential rate for state-space models. Adding (well-calibrated) MCMC
mutation steps should, consequently, lead to stable algorithms in many cases
of interest. In fact, it is doubtful that a mutation step must be performed
at each iteration to achieve stability. Chopin (2002) argues and provides
some experimental evidence that it may be sufficient to perform move steps
at a logarithmic rate, that is, the $n$th move step should occur at iteration
$t_n \sim \exp(\alpha n)$.
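
To make the logarithmic rate concrete, the sketch below lists the iterations at which a move step would be triggered under the rule $t_n \sim \exp(\alpha n)$. Rounding up to the next integer, the value of $\alpha$ and the function name are assumptions chosen for illustration; the text does not specify them.

```python
import math


def move_step_iterations(alpha, horizon):
    """Iterations t_n = ceil(exp(alpha * n)), n = 1, 2, ..., not exceeding `horizon`."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    iterations = []
    n = 1
    while True:
        t_n = math.ceil(math.exp(alpha * n))
        if t_n > horizon:
            break
        # Skip duplicates that rounding may create for small alpha.
        if not iterations or t_n > iterations[-1]:
            iterations.append(t_n)
        n += 1
    return iterations


schedule = move_step_iterations(alpha=0.5, horizon=100)
print(schedule)
```

With this rule the number of move steps up to iteration $t$ grows like $\log(t)/\alpha$, so the MCMC cost becomes negligible relative to the number of correction steps.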

Situations where a latent process intervenes seem less promising. Smoothing
the states is an especially difficult problem, and we do not think that there
is any solution for circumventing the curse of dimensionality that we have
pointed out in the previous section. Even if mutation steps are performed at
every iteration, the MCMC transition kernels should themselves suffer from
the curse of dimensionality, in that their ability to rejuvenate particles of
dimension $t$ is likely to decrease with $t$.

Resample-move algorithms remain an interesting alternative when the
considered dynamical model includes a fixed parameter $\theta$. MCMC mutation
steps should avoid depletion in simulated values of $\theta$, and make it possible
at least to filter the states and estimate the parameter over reasonable
periods of time. Unfortunately, the corresponding MCMC transition kernels
will often depend on the whole past trajectory, so that long term stability
remains uncertain.

In such complicated setups it is necessary to monitor, at least numerically,
the degeneracy of the considered particle filter algorithm. We propose
the following method. Run $k$, say $k = 10$, parallel independent particle filters
of size $H$. For any quantity to be estimated, compute the average of
the $k$ corresponding estimates. This new estimator is clearly consistent and
asymptotically normal. Moreover, the computational cost of this strategy
is identical to that of a single particle filter of size $kH$, while the obtained
precision will also be of the same order of magnitude in both cases, that is to
say $\{V_t(\varphi)/(kH)\}^{1/2}$. This method does not, therefore, incur an unnecessary
computational load, and allows for assessing the stability of the algorithm
through the evolution of the empirical variance of these $k$ estimates.
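
The monitoring strategy can be sketched as follows. This is an illustrative implementation under assumptions that are not in the text: a linear Gaussian state-space model ($x_t = 0.9\,x_{t-1} + \varepsilon_t$, $y_t = x_t + \eta_t$), a bootstrap filter with multinomial resampling, and the filtering mean as the quantity of interest. All function names are hypothetical.

```python
import math
import random
import statistics


def bootstrap_filter_means(observations, n_particles, rng, phi=0.9):
    """Filtering means E(x_t | y_{1:t}) estimated by one bootstrap particle filter."""
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        particles = [phi * x + rng.gauss(0.0, 1.0) for x in particles]      # mutation
        weights = [math.exp(-0.5 * (y - x) ** 2) for x in particles]        # correction
        total = sum(weights)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        particles = rng.choices(particles, weights=weights, k=n_particles)  # selection
    return estimates


def monitor_degeneracy(observations, k=10, n_particles=100, seed=0):
    """Run k independent filters; return the averaged estimates and, for each
    iteration, the empirical variance of the k individual estimates."""
    runs = [
        bootstrap_filter_means(observations, n_particles, random.Random(seed + i))
        for i in range(k)
    ]
    averaged = [statistics.mean(column) for column in zip(*runs)]
    spread = [statistics.variance(column) for column in zip(*runs)]
    return averaged, spread


# Synthetic observations from the assumed model.
data_rng = random.Random(123)
state, observations = 0.0, []
for _ in range(30):
    state = 0.9 * state + data_rng.gauss(0.0, 1.0)
    observations.append(state + data_rng.gauss(0.0, 1.0))

averaged, spread = monitor_degeneracy(observations)
print(max(spread))
```

Each entry of `spread` estimates the variance of a single filter of size $H$ at that iteration; dividing it by $k$ gives a rough variance estimate for the averaged estimator, and a steady increase of `spread` over time would signal degeneracy.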

APPENDIX 

A.1. Proofs of Theorems 1 and 2. We start by outlining some basic
properties of the sets $\Phi_t^{(d)}$ with respect to linearity. The set $\Phi_t^{(d)}$ is stable
through linear transformations, that is, $\varphi \in \Phi_t^{(d)} \Rightarrow M\varphi \in \Phi_t^{(d')}$ if $M$ is a
$d' \times d$ matrix of real numbers. In particular, if the vector function
$\varphi = (\varphi_1, \ldots, \varphi_d)'$ belongs to $\Phi_t^{(d)}$, then each of its coordinates belongs to $\Phi_t^{(1)}$. The
converse proposition is also true. Finally, we have $V_t(M\varphi + \lambda) = M V_t(\varphi) M'$
for any constant $\lambda \in \mathbb{R}^{d'}$, and this relation also holds for the operators $\tilde V_t$ and
$\hat V_t$. Proving these statements is not difficult and is left to the reader.

The proof works by induction with Lemmas A.1-A.3 for Theorem 1, and
Lemmas A.1, A.2 and A.4 for Theorem 2. The inductive hypothesis is the
following. For a given $t > 0$, it is assumed that for all $\varphi \in \Phi_{t-1}^{(d)}$,

$$H^{1/2} \left\{ \frac{1}{H} \sum_{j=1}^{H} \varphi(\hat\theta_{t-1}^{(j,H)}) - E_{\pi_{t-1}}(\varphi) \right\} \xrightarrow{\mathcal{D}} \mathcal{N}\{0, \hat V_{t-1}(\varphi)\}. \tag{16}$$
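
Lemma A.1 (Mutation). Under the inductive hypothesis, we have

$$H^{1/2} \left\{ \frac{1}{H} \sum_{j=1}^{H} \varphi(\theta_t^{(j,H)}) - E_{\tilde\pi_t}(\varphi) \right\} \xrightarrow{\mathcal{D}} \mathcal{N}\{0, \tilde V_t(\varphi)\}$$

for any measurable $\varphi : \Theta_t \to \mathbb{R}^d$ such that the function
$\mu : \theta_{t-1} \mapsto E_{k_t(\theta_{t-1},\cdot)}\{\varphi(\cdot) - E_{\tilde\pi_t}(\varphi)\}$
belongs to $\Phi_{t-1}^{(d)}$ and there exists $\delta > 0$ such that $E_{\tilde\pi_t}\|\varphi\|^{2+\delta} < +\infty$.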

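Proof. We assume that $\varphi$ is real-valued ($d = 1$). The generalization
to $d > 1$ follows directly from the Cramér-Wold theorem and the linearity
properties stated above.

Let $\bar\varphi = \varphi - E_{\tilde\pi_t}(\varphi)$, $\mu(\theta_{t-1}) = E_{k_t(\theta_{t-1},\cdot)}\{\bar\varphi(\cdot)\}$, $\sigma^2(\theta_{t-1}) = \mathrm{Var}_{k_t(\theta_{t-1},\cdot)}\{\bar\varphi(\cdot)\}$
and $\sigma_0^2 = E_{\pi_{t-1}}(\sigma^2)$. We have $E_{\pi_{t-1}}(\mu) = 0$, and by Jensen's inequality,

$$\sigma_0^2 = E_{\pi_{t-1}}[\mathrm{Var}_{k_t(\theta_{t-1},\cdot)}\{\bar\varphi(\cdot)\}] \le E_{\pi_{t-1}}[E_{k_t(\theta_{t-1},\cdot)}\{\bar\varphi(\cdot)^2\}] \le \{E_{\tilde\pi_t}(|\bar\varphi|^{2+\delta})\}^{2/(2+\delta)} < +\infty,$$

which makes it possible to apply the law of large numbers for particle filters
to $\sigma^2$,

$$H^{-1} \sum_{j=1}^{H} \sigma^2(\hat\theta_{t-1}^{(j,H)}) \to \sigma_0^2 \quad \text{almost surely.} \tag{17}$$
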
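Defining

$$\nu(\theta_{t-1}) = E_{k_t(\theta_{t-1},\cdot)}\{|\bar\varphi(\cdot) - \mu(\theta_{t-1})|^{2+\delta}\} \tag{18}$$

$$\le 2^{1+\delta}\{E_{k_t(\theta_{t-1},\cdot)}|\bar\varphi(\cdot)|^{2+\delta} + |E_{k_t(\theta_{t-1},\cdot)}\bar\varphi(\cdot)|^{2+\delta}\} \tag{19}$$

$$\le 2^{2+\delta}\{E_{k_t(\theta_{t-1},\cdot)}|\bar\varphi(\cdot)|^{2+\delta}\}, \tag{20}$$

where (19) comes from the $C_r$ inequality and (20) from Jensen's inequality,
we deduce that

$$E_{\pi_{t-1}}(\nu) \le 2^{2+\delta} E_{\tilde\pi_t}|\bar\varphi|^{2+\delta} < +\infty.$$

This inequality ensures that the expectations defining $\nu$ in (18) (and, similarly,
those defining $\mu$ and $\sigma^2$) are finite for almost every $\theta_{t-1}$. It follows
that

$$H^{-1} \sum_{j=1}^{H} \nu(\hat\theta_{t-1}^{(j,H)}) \to E_{\pi_{t-1}}(\nu) \quad \text{almost surely,}$$

and combining this result with (17), we obtain the almost sure convergence
of

$$\rho_H = \frac{\sum_{j=1}^{H} \nu(\hat\theta_{t-1}^{(j,H)})}{\{\sum_{j=1}^{H} \sigma^2(\hat\theta_{t-1}^{(j,H)})\}^{(2+\delta)/2}} \to 0. \tag{21}$$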



Let $T_H = H^{-1/2} \sum_{j=1}^{H} \bar\varphi(\theta_t^{(j,H)})$, let $\mathcal{S}_{t-1}$ denote the sigma-field generated by
the random variables forming the triangular array $(\hat\theta_{t-1}^{(j,H)})_{j \le H}$, that is, the
particle system at time $t - 1$, and let $\mu_H = E(T_H \mid \mathcal{S}_{t-1})$. Conditional on $\mathcal{S}_{t-1}$,
the $\bar\varphi(\theta_t^{(j,H)})$'s form a triangular array of independent variables which satisfy
the Liapunov condition, see (21), and have variances whose mean converges
to $\sigma_0^2$, see (17). Therefore [Billingsley (1995), page 362], the following central
limit theorem for triangular arrays of independent variables holds:

$$(T_H - \mu_H) \mid \mathcal{S}_{t-1} \xrightarrow{\mathcal{D}} \mathcal{N}(0, \sigma_0^2). \tag{22}$$

Since $E_{\pi_{t-1}}(\mu) = 0$ and $\mu \in \Phi_{t-1}^{(1)}$, we have also, by applying (16) to the
function $\mu$,

$$\mu_H = H^{-1/2} \sum_{j=1}^{H} \mu(\hat\theta_{t-1}^{(j,H)}) \xrightarrow{\mathcal{D}} \mathcal{N}\{0, \hat V_{t-1}(\mu)\}. \tag{23}$$

The characteristic function of $T_H$ is

$$\Phi_{T_H}(u) = E\{\exp(iuT_H)\} = E[\exp(iu\mu_H)\, E\{\exp(iuT_H - iu\mu_H) \mid \mathcal{S}_{t-1}\}],$$

where $E\{\exp(iuT_H - iu\mu_H) \mid \mathcal{S}_{t-1}\}$ is the characteristic function of $T_H - \mu_H$
conditional on $\mathcal{S}_{t-1}$, which according to (22) converges to $\exp(-\sigma_0^2 u^2/2)$. It
follows from (23) that

$$\exp(iu\mu_H)\, E\{\exp(iuT_H - iu\mu_H) \mid \mathcal{S}_{t-1}\} \xrightarrow{\mathcal{D}} \exp(-\sigma_0^2 u^2/2 + iuZ),$$

where $Z$ is a random variable distributed according to $\mathcal{N}\{0, \hat V_{t-1}(\mu)\}$. The
expectation of the left-hand side term converges to the expectation of the
right-hand side term following the dominated convergence theorem, and this
completes the proof. □

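Lemma A.2 (Correction). Let $\varphi \in \Phi_t^{(d)}$, assume the inductive hypothesis
holds and the function $\theta_t \mapsto 1$ belongs to $\Phi_t^{(1)}$. Then

$$H^{1/2} \left\{ \frac{\sum_{j=1}^{H} w_t^{(j,H)} \varphi(\theta_t^{(j,H)})}{\sum_{j=1}^{H} w_t^{(j,H)}} - E_{\pi_t}(\varphi) \right\} \xrightarrow{\mathcal{D}} \mathcal{N}\{0, V_t(\varphi)\}.$$

Proof. Let $\bar\varphi = \varphi - E_{\pi_t}(\varphi)$. For notational convenience we assume that
$d = 1$, but the generalization to $d > 1$ is straightforward. It is clear that the
vector function $\psi = (\upsilon_t \cdot \bar\varphi, \upsilon_t)'$ fulfills the conditions mentioned in Lemma A.1,
and as such satisfies

$$H^{1/2} \left\{ \frac{1}{H} \sum_{j=1}^{H} \psi(\theta_t^{(j,H)}) - \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} \xrightarrow{\mathcal{D}} \mathcal{N}\{0, \tilde V_t(\psi)\}.$$

Then, resorting to the $\delta$-method with function $g(x, y) = x/y$, we obtain

$$H^{1/2}\, \frac{\sum_{j=1}^{H} \upsilon_t(\theta_t^{(j,H)})\, \bar\varphi(\theta_t^{(j,H)})}{\sum_{j=1}^{H} \upsilon_t(\theta_t^{(j,H)})} \xrightarrow{\mathcal{D}} \mathcal{N}(0, V),$$

where $V = \{(\partial g/\partial x, \partial g/\partial y)(0, 1)\}\, \tilde V_t(\psi)\, \{(\partial g/\partial x, \partial g/\partial y)(0, 1)\}' = \tilde V_t\{\upsilon_t \cdot (\varphi - E_{\pi_t}\varphi)\}$.
The left-hand side term is unchanged if we replace the $\upsilon_t(\theta_t^{(j,H)})$'s
by the weights $w_t^{(j,H)}$, since they are proportional. □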

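Lemma A.3 (Selection, multinomial resampling). Let $\hat V_t(\varphi) = V_t(\varphi) + \mathrm{Var}_{\pi_t}(\varphi)$
and assume the particle system is resampled according to the multinomial
scheme. Then, under the same conditions as in Lemma A.2,

$$H^{1/2} \left\{ \frac{1}{H} \sum_{j=1}^{H} \varphi(\hat\theta_t^{(j,H)}) - E_{\pi_t}(\varphi) \right\} \xrightarrow{\mathcal{D}} \mathcal{N}\{0, \hat V_t(\varphi)\}.$$

Proof. The proof is similar to that of Lemma A.1. Assume $d = 1$, denote
by $\mathcal{S}_t$ the sigma-field generated by the random variables $(\theta_t^{(j,H)}, w_t^{(j,H)})_{j \le H}$
and let $\bar\varphi = \varphi - E_{\pi_t}(\varphi)$, $T_H = H^{-1/2} \sum_{j=1}^{H} \bar\varphi(\hat\theta_t^{(j,H)})$ and $\mu_H = E(T_H \mid \mathcal{S}_t)$.
Conditional on $\mathcal{S}_t$, $T_H$ is, up to a factor $H^{-1/2}$, a sum of independent draws
from the multinomial distribution which produces $\bar\varphi(\theta_t^{(j,H)})$ with probability
$w_t^{(j,H)} / \sum_{j=1}^{H} w_t^{(j,H)}$. Then, as in Lemma A.1, we have

$$(T_H - \mu_H) \mid \mathcal{S}_t \xrightarrow{\mathcal{D}} \mathcal{N}(0, \sigma_0^2),$$

where this time $\sigma_0^2 = \mathrm{Var}_{\pi_t}(\varphi)$, which is the limit as $H \to +\infty$ of the variance
of the multinomial distribution mentioned above. The proof is completed
along the same lines as in Lemma A.1. □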

Lemma A.4 (Selection, residual resampling). Let $\hat V_t(\varphi)$ take the value
given by (7) and assume the particle system is resampled according to the
residual resampling scheme. Then, under the same conditions as in Lemma A.2,

$$H^{1/2} \left\{ \frac{1}{H} \sum_{j=1}^{H} \varphi(\hat\theta_t^{(j,H)}) - E_{\pi_t}(\varphi) \right\} \xrightarrow{\mathcal{D}} \mathcal{N}\{0, \hat V_t(\varphi)\}.$$

Proof. The proof is identical to that of Lemma A.2, except that conditional
on $\mathcal{S}_t$, $T_H$ is $H^{-1/2}$ times the sum of a constant and of independent draws
from the multinomial distribution described in Section 2.1. This yields a
different value for $\sigma_0^2$.

In addition, we also have to make sure that the number of these independent
draws $H^r$ tends toward infinity. In fact, $H^r/H \to E_{\tilde\pi_t}[r(\upsilon_t)]$. To see
this, consider

$$\frac{1}{H} \sum_{j=1}^{H} r(H\rho_j) - \frac{1}{H} \sum_{j=1}^{H} r\{\upsilon_t(\theta_t^{(j,H)})\},$$

where $H\rho_j = \upsilon_t(\theta_t^{(j,H)}) / \{H^{-1} \sum_j \upsilon_t(\theta_t^{(j,H)})\}$, see Section 2.1, so that the
difference above should eventually be zero as $H^{-1} \sum_j \upsilon_t(\theta_t^{(j,H)}) \to 1$. More
precisely, we have $|r(x) - r(y)| \le 1$, in general, and $r(x) - r(y) = x - y$
provided $|x - y| < \varepsilon$ and $r(x) \in [\varepsilon, 1 - \varepsilon]$ for any $\varepsilon < 1/2$. Therefore,
assuming that $\{H^{-1} \sum_j \upsilon_t(\theta_t^{(j,H)})\}^{-1} \in [1 - \varepsilon', 1 + \varepsilon']$ for some $\varepsilon' > 0$ and $H$
large enough, we get that the sum above should be zero plus something
bounded from above by the proportion of particles such that $\varepsilon' \upsilon_t(\cdot) > 1/2$
or $r\{\upsilon_t(\cdot)\} \notin [\varepsilon' \upsilon_t(\cdot), 1 - \varepsilon' \upsilon_t(\cdot)]$. This proportion can be made as small as
necessary. □
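
For readers who want to see the selection step discussed in Lemmas A.3 and A.4 in concrete form, here is a minimal sketch of residual resampling as it is commonly implemented: each particle is first copied $\lfloor H\rho_j \rfloor$ times, and the remaining $H^r$ slots are filled by multinomial draws with probabilities proportional to the fractional parts $r(H\rho_j)$. The function name and the choice to return offspring counts are assumptions made for illustration.

```python
import math
import random


def residual_resample(weights, rng):
    """Return the number of offspring of each particle under residual resampling."""
    n = len(weights)
    total = sum(weights)
    scaled = [n * w / total for w in weights]              # H * rho_j
    counts = [math.floor(s) for s in scaled]               # deterministic copies
    residuals = [s - c for s, c in zip(scaled, counts)]    # fractional parts r(H * rho_j)
    n_random = n - sum(counts)                             # number of multinomial draws
    if n_random > 0:
        # Remaining offspring: independent draws with probability proportional
        # to the fractional parts.
        for j in rng.choices(range(n), weights=residuals, k=n_random):
            counts[j] += 1
    return counts


counts = residual_resample([0.5, 0.25, 0.125, 0.125], random.Random(1))
print(counts)
```

In this example the first two particles receive 2 and 1 offspring deterministically, and only a single offspring is allocated at random between the last two particles, which is why the scheme adds less Monte Carlo noise than multinomial resampling.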

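A.2. Proof of Theorem 3. Let $\varphi : \Xi_t \to \mathbb{R}^d$ and $\bar\varphi = \varphi - E_{\pi_t}(\varphi) = \varphi - E_{\pi_t^m}(\varphi)$
for a given $t \ge 0$. To simplify notation, it is assumed that $d = 1$,
but the adaptation to the general case is straightforward. All quantities
related to the "marginalized" particle filter are distinguished by the m-suffix.
For instance, $\mathcal{E}_t^m(\varphi)$ stands for the function $\xi_{t-1} \mapsto E_{k_t^m(\xi_{t-1},\cdot)}\{\upsilon_t^m(\cdot)\varphi(\cdot)\}$, in
agreement with the definition of $\mathcal{E}_t(\varphi)$ in (10). In this respect, the marginal
weight function $\upsilon_t^m(\cdot)$ is $\pi_t^m(\cdot)/\tilde\pi_t^m(\cdot)$, and if we define the "conditional"
weight function $\upsilon_t^c(\lambda_t \mid \xi_t) = \pi_t^c(\lambda_t \mid \xi_t)/\tilde\pi_t^c(\lambda_t \mid \xi_t)$, we have the identity

$$\upsilon_t(\xi_t, \lambda_t) = \upsilon_t^m(\xi_t)\,\upsilon_t^c(\lambda_t \mid \xi_t).$$
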
It follows from (12) that 

since ¥jj^^{v'[) = 1, and by induction, we show similarly, for k <t, that 
Hence, for k <t, 

by Jensen's inequality. From the closed form (9) of $V_t(\varphi)$, we deduce the
inequality $V_t^m(\varphi) \le V_t(\varphi)$ for the case when the selection step follows the
multinomial scheme. Alternatively, if the selection step consists of residual
resampling, let $\bar\varphi = \varphi - E_{\tilde\pi_t}\{r(\upsilon_t)\varphi\}/E_{\tilde\pi_t}\{r(\upsilon_t)\}$. Then

""t y t ) 

>E^Y'[{Kicr{vt)-rivr)}ip% 

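and since $E_{\tilde\pi_t^c}(\upsilon_t) = \upsilon_t^m$, we have $E_{\tilde\pi_t^c}\lfloor\upsilon_t\rfloor \le \lfloor\upsilon_t^m\rfloor$, hence $E_{\tilde\pi_t^c}\, r(\upsilon_t) \ge r(\upsilon_t^m)$
and, consequently, $R_t(\varphi) \ge R_t^m(\varphi)$. It is then easy to show by
induction that the desired inequality is also verified in the residual case.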

A.3. Regularity conditions and proof of Theorem 4. Let $\pi_0(\theta)$ denote
the prior density and $p(y_{1:t} \mid \theta)$ the likelihood of the $t$ first observations, so
that through Bayes formula,

$$\pi_t(\theta) = \pi(\theta \mid y_{1:t}) \propto \pi_0(\theta)\,p(y_{1:t} \mid \theta).$$

Let $l_t(\theta) = \log p(y_{1:t} \mid \theta)$. The following statements are assumed to hold
almost surely:

1. The maximum $\hat\theta_t$ of $l_t(\theta)$ exists and converges as $t \to +\infty$ to $\theta_0$ such that
$\pi_0(\theta_0) > 0$ and $\tilde\pi_0(\theta_0) > 0$.

2. The matrix

$$\Sigma_t = -\frac{1}{t}\, \frac{\partial^2 l_t(\hat\theta_t)}{\partial\theta\,\partial\theta'}$$

is positive definite and converges to $I(\theta_0)$, the Fisher information matrix
at $\theta_0$.

3. There exists $\Delta > 0$ such that

$$0 < \delta < \Delta \implies \limsup_{t \to +\infty} \left[ \frac{1}{t} \sup_{\|\theta - \hat\theta_t\| > \delta} \{l_t(\theta) - l_t(\hat\theta_t)\} \right] < 0.$$

4. The functions $\pi_0(\theta)$ and $l_t(\theta)$ are six-times continuously differentiable,
the partial derivatives of order six of $l_t(\theta)/t$ are bounded on any compact
set $\Theta' \subset \Theta$, and the bound does not depend on $t$ and the observations.

5. $\varphi : \Theta \to \mathbb{R}^d$ is six-times continuously differentiable, $\varphi'(\theta_0) \ne 0$.

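For convenience, we start with the one-dimensional case ($p = 1$). The
Laplace approximation of an integral [see, e.g., Tierney, Kass and Kadane
(1989)] is

$$\int \psi(\theta) \exp\{-t h(\theta)\}\,d\theta = (2\pi/t)^{1/2}\, \sigma \exp\{-t\hat h\} \left[ \hat\psi + \frac{1}{2t} \left\{ \sigma^2 \hat\psi'' - \sigma^4 \hat\psi' \hat h''' + \frac{5}{12} \hat\psi (\hat h''')^2 \sigma^6 - \frac{1}{4} \hat\psi \hat h^{(4)} \sigma^4 \right\} + O(t^{-2}) \right],$$

where hats on $\psi$, $h$ and their derivatives indicate evaluation at the point
which minimizes $h$, and $\sigma = (1/\hat h'')^{1/2}$. This approximation remains valid
for a function $h_t$ depending on $t$, provided that the fluctuations of $h_t$ or
its derivatives can be controlled in some way. Conditions above allow, for
instance, for applying this approximation to the functions $h_{t,1}(\theta) = -l_t(\theta)/t$
and $h_{t,2}(\theta) = -2 l_t(\theta)/t$; see Schervish [(1995), page 446] for technical details.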
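
A quick numerical check can make the order of the Laplace error tangible. The sketch below is illustrative only: the functions $h(\theta) = \theta^2/2 + \theta^4/4$ and $\psi(\theta) = 1 + \theta^2$ are arbitrary choices, the integral is computed by a trapezoidal rule, and the second-order correction uses the standard $O(t^{-1})$ term of the Laplace expansion specialized to this example (where $\hat h''' = 0$).

```python
import math


def integrate(func, lower, upper, n_steps=20000):
    """Composite trapezoidal rule."""
    step = (upper - lower) / n_steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, n_steps):
        total += func(lower + i * step)
    return total * step


def h(theta):
    return 0.5 * theta ** 2 + 0.25 * theta ** 4   # minimized at 0, h''(0) = 1


def psi(theta):
    return 1.0 + theta ** 2                        # psi(0) = 1, psi''(0) = 2


def laplace(t, second_order):
    sigma = 1.0                                    # (1 / h''(0)) ** 0.5
    bracket = 1.0                                  # leading term psi(0)
    if second_order:
        # sigma^2 psi'' - (1/4) psi h'''' sigma^4, with h'''(0) = 0 and h''''(0) = 6
        bracket += (2.0 - 0.25 * 6.0) / (2.0 * t)
    return math.sqrt(2.0 * math.pi / t) * sigma * bracket   # exp(-t h(0)) = 1


t = 50.0
exact = integrate(lambda th: psi(th) * math.exp(-t * h(th)), -3.0, 3.0)
err_first = abs(laplace(t, False) / exact - 1.0)
err_second = abs(laplace(t, True) / exact - 1.0)
print(err_first, err_second)
```

The leading-order relative error behaves like $O(t^{-1})$ while the corrected approximation is accurate to $O(t^{-2})$, which is the level of precision needed when leading terms cancel in a ratio of such integrals.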
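It is necessary, however, to assume that $\psi(\theta_0) \ne 0$, so that $\psi$ is either strictly
positive or strictly negative at least in a neighborhood of $\theta_0$. Since
$V_t^{\mathrm{sis}}(\varphi) = V_t^{\mathrm{sis}}(\varphi + \lambda)$ for any $\lambda \in \mathbb{R}$, we assume without loss of generality that
$\varphi(\theta_0) \ne 0$. $V_t^{\mathrm{sis}}(\varphi)$ equals

$$\frac{\int \psi_1(\theta)\,p(y_{1:t} \mid \theta)^2\,d\theta - 2E_{\pi_t}(\varphi) \int \psi_2(\theta)\,p(y_{1:t} \mid \theta)^2\,d\theta}{\{\int \pi_0(\theta)\,p(y_{1:t} \mid \theta)\,d\theta\}^2} + \frac{\{E_{\pi_t}(\varphi)\}^2 \int \psi_3(\theta)\,p(y_{1:t} \mid \theta)^2\,d\theta}{\{\int \pi_0(\theta)\,p(y_{1:t} \mid \theta)\,d\theta\}^2}, \tag{24}$$

where $\psi_1 = \pi_0(\theta)^2 \varphi(\theta)^2/\tilde\pi_0(\theta)$, $\psi_2 = \pi_0(\theta)^2 \varphi(\theta)/\tilde\pi_0(\theta)$ and $\psi_3 = \pi_0(\theta)^2/\tilde\pi_0(\theta)$.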

Combining the appropriate Laplace approximations, we get that 

tV2 



2(^St)V2 

{no{9t) + Bt~^ + Oit-W 
2(^Sj)i/2 no{et){l + B7ro{9tyH~^+0{t~^)}^ 
where $A$ is the sum of $O(t^{-1})$ terms corresponding to the three Laplace
expansions of the numerator, and $B$ is the $O(t^{-1})$ term of the denominator.
Since $\varphi(\hat\theta_t) - E_{\pi_t}(\varphi) = O(t^{-1})$, $\Sigma_t = I(\theta_0) + O(t^{-1})$ and
$\psi(\hat\theta_t) = \psi(\theta_0) + O(t^{-1})$ for any continuous function $\psi$, we get through appropriate
derivations that

27rV^7ro(6'o) 

Derivations in multidimensional cases are much the same, except that notation
is more cumbersome. When $p > 1$, the factor $t^{-1/2}$ in the Laplace
expansion is replaced by $t^{-p/2}$, so that in the ratio (24) we get a factor $t^{p/2}$,
and since the $t^{p/2}$ terms cancel as in the one-dimensional case, the actual
rate of divergence is $t^{p/2-1}$, and this completes the first part of the proof.

In the sequential importance resampling case (multinomial scheme), $k_t(\theta, \cdot) = \delta_\theta$
and $\tilde\pi_t = \pi_{t-1}$, and according to (9),

(25) Vti^) = Vr{^) + j2^ ^ 



fc=l 



-{ip-E^M} 



Then through a direct adaptation of expansions above we obtain a divergence
rate for $V_t(\varphi)$ of order $O(\sum_{k=0}^{t-1} (t - k)^{p/2-1}) = O(t^{p/2})$. For the residual
case, it follows from (11) and (25) that



-{(/.-E^,(^)} 



The difficulty in this case is that the noncontinuous function $r(\cdot)$ takes part
in the expression for $R_k(\cdot)$, see (8). It is clear, however, that the Laplace
expansion can be generalized to cases where regularity conditions for the
likelihood and other functions are fulfilled only locally around $\theta_0$. The additional
assumption that $\pi_t(\theta_0)/\pi_{t-1}(\theta_0)$ is not an integer for any $t > 0$ allows
$r(\upsilon_t)$ to be six-times continuously differentiable in a neighborhood around
$\theta_0$, and, therefore, makes it possible to expand the terms of the sum above,
which leads to a rate of divergence of order $O(t^{p/2})$ in the same way as in
the multinomial case.

A.4. Proof of Theorem 5. As a preliminary, we state without proof the
following inequality. Let $\varphi, \psi : \mathbb{R} \to \mathbb{R}$ such that $\varphi \ge 0$, $\sup\psi \ge 0$ and $\inf\psi \le 0$.
Then

$$\Delta(\varphi\psi) \le \sup\varphi \cdot \Delta\psi. \tag{26}$$
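
Inequality (26) follows from the bounds $\varphi\psi \le \sup\varphi \cdot \sup\psi$ and $\varphi\psi \ge \sup\varphi \cdot \inf\psi$, which hold because $\varphi \ge 0$ and $\psi$ changes sign. The short script below is only a randomized sanity check on finite grids, with arbitrary illustrative ranges for the two functions; it is not a substitute for the proof.

```python
import random


def variation(values):
    """Largest minus smallest value of a function tabulated on a finite grid."""
    return max(values) - min(values)


def check_inequality(n_points=50, n_trials=1000, seed=0):
    """Randomized check of Delta(phi * psi) <= sup(phi) * Delta(psi)."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        phi = [rng.uniform(0.0, 3.0) for _ in range(n_points)]    # phi >= 0
        psi = [rng.uniform(-2.0, 5.0) for _ in range(n_points)]
        psi[0], psi[1] = abs(psi[0]), -abs(psi[1])                 # sup psi >= 0 >= inf psi
        product = [a * b for a, b in zip(phi, psi)]
        if variation(product) > max(phi) * variation(psi) + 1e-12:
            return False
    return True


print(check_inequality())   # prints True
```

The sign condition on $\psi$ matters: for a strictly positive $\psi$ the product can vary more than $\sup\varphi \cdot \Delta\psi$, for instance when $\psi$ is constant and $\varphi$ is not.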

Due to particular cancelations, the weight function $\upsilon_t(x_{1:t})$ only depends
on $x_{t-1}$ and $x_t$ in the state space case:

$$\upsilon_t(x_{1:t}) = \upsilon_t(x_{t-1}, x_t) \propto \frac{f(y_t \mid x_t)\,g(x_t \mid x_{t-1})}{q_t(x_t \mid x_{t-1})}. \tag{27}$$

Straightforward consequences of this expression are the identities

$$\pi_t(x_t \mid x_{t-1}) = \frac{q_t(x_t \mid x_{t-1})\,\upsilon_t(x_{t-1}, x_t)}{\int q_t(x \mid x_{t-1})\,\upsilon_t(x_{t-1}, x)\,dx}, \tag{28}$$

$$\pi_{t+1}(x_{t+1} \mid x_k) = \frac{\int \pi_t(x_t \mid x_k)\,q_{t+1}(x_{t+1} \mid x_t)\,\upsilon_{t+1}(x_t, x_{t+1})\,dx_t}{\iint \pi_t(x_t \mid x_k)\,q_{t+1}(x \mid x_t)\,\upsilon_{t+1}(x_t, x)\,dx_t\,dx}, \tag{29}$$

for $k < t$, where $\pi_t(x_t \mid x_k)$ denotes the conditional posterior density of $x_t$
given $x_k$ and the $t$ first observations, that is, $\pi_t(x_t \mid x_k) = \pi(x_t \mid x_k, y_{1:t}) =
\pi(x_t \mid x_k, y_{k+1:t})$. We start by proving some useful lemmas.

Lemma A.5. The conditional posterior density $\pi_t(x_t \mid x_k)$, $k < t$, defines
a Markov transition from $x_k$ to $x_t$ whose contraction coefficient is less than
or equal to $(1 - C^{-2})^{t-k}$.

Proof. This is adapted from Künsch (2001). For $x_k, x'_k, x_{k+1} \in \mathcal{X}$, $k < t$,

$$\frac{\pi_t(x_{k+1} \mid x_k)}{\pi_t(x_{k+1} \mid x'_k)} = \frac{g(x_{k+1} \mid x_k)\,p(y_{k+1:t} \mid x'_k)}{g(x_{k+1} \mid x'_k)\,p(y_{k+1:t} \mid x_k)} \le C^2,$$

since $g(x_{k+1} \mid x_k) \le C g(x_{k+1} \mid x'_k)$ and

$$p(y_{k+1:t} \mid x'_k) = \int g(x_{k+1} \mid x'_k)\,p(y_{k+1:t} \mid x_{k+1})\,dx_{k+1} \le C \int g(x_{k+1} \mid x_k)\,p(y_{k+1:t} \mid x_{k+1})\,dx_{k+1}.$$

Therefore, the contraction coefficients of Markov transitions $\pi_t(x_{k+1} \mid x_k)$ and
$\pi_t(x_t \mid x_k)$ are less than or equal to, respectively, $(1 - C^{-2})$ and $(1 - C^{-2})^{t-k}$.
□
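
The last step of this proof relies on the fact that contraction coefficients multiply under composition of Markov transitions. The sketch below checks this property on a finite state space, where transitions are row-stochastic matrices; the matrices `P` and `Q` are arbitrary examples, and the discrete setting is an assumption made for illustration.

```python
def contraction_coefficient(P):
    """Dobrushin coefficient: half the largest L1 distance between two rows."""
    return 0.5 * max(
        sum(abs(a - b) for a, b in zip(row_i, row_k)) for row_i in P for row_k in P
    )


def compose(P, Q):
    """Transition obtained by applying P first and then Q: (PQ)[i][j] = sum_k P[i][k] Q[k][j]."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
Q = [[0.6, 0.2, 0.2],
     [0.1, 0.7, 0.2],
     [0.25, 0.25, 0.5]]

rho_P = contraction_coefficient(P)
rho_Q = contraction_coefficient(Q)
rho_PQ = contraction_coefficient(compose(P, Q))
print(rho_PQ, rho_P * rho_Q)   # the first value should not exceed the second

# Iterating the same transition gives a geometric decay of the coefficient.
power = P
for n in range(2, 6):
    power = compose(power, P)
    print(n, contraction_coefficient(power), rho_P ** n)
```

This submultiplicative behavior is what turns a one-step bound such as $1 - C^{-2}$ into the geometric bound $(1 - C^{-2})^{t-k}$ over $t - k$ steps.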

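Lemma A.6. Let $\lambda$ be a probability density on $\mathcal{X}$ and $h(x \mid x')$ a conditional
probability density defining a Markov transition on $\mathcal{X}$. Then for any
$x' \in \mathcal{X}$, $y \in \mathcal{Y}$,

$$\frac{\int f(y \mid x)\,h(x \mid x')\,dx}{E_{\lambda(x'')}\{\int f(y \mid x)\,h(x \mid x'')\,dx\}} \le 1 + \rho_h C_f,$$

where $\rho_h$ is the contraction coefficient of $h(\cdot \mid \cdot)$, and $C_f = \bar f/\underline{f} - 1$.

Proof. It follows from the definition of $\rho_h$ [see (14)] that for $x', x'' \in \mathcal{X}$,

$$\int f(y \mid x)\,h(x \mid x')\,dx - \int f(y \mid x)\,h(x \mid x'')\,dx \le \rho_h (\bar f - \underline{f})$$

and therefore,

$$\sup_{x' \in \mathcal{X}} \left\{ \int f(y \mid x)\,h(x \mid x')\,dx \right\} \le E_{\lambda(x'')} \left\{ \int f(y \mid x)\,h(x \mid x'')\,dx \right\} + \rho_h (\bar f - \underline{f}),$$

so that

$$\frac{\sup_{x' \in \mathcal{X}} \{\int f(y \mid x)\,h(x \mid x')\,dx\}}{E_{\lambda(x'')}\{\int f(y \mid x)\,h(x \mid x'')\,dx\}} \le 1 + \frac{\rho_h (\bar f - \underline{f})}{E_{\lambda(x'')}\{\int f(y \mid x)\,h(x \mid x'')\,dx\}} \le 1 + \rho_h \left( \frac{\bar f}{\underline{f}} - 1 \right).$$
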

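Lemma A.7. Let $\rho = 1 - C^{-1}$ and $\rho_2 = 1 - C^{-2}$. Then for $k < t$,

$$\Delta\mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\} \le \rho_2^{t-k} \prod_{l=1}^{t-k} (1 + \rho \rho_2^{l-1} C_f)\, \Delta\psi$$

for any real-valued filtering function $\varphi : x_{1:t} \to \psi(x_t)$.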

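Proof. Let $\bar\varphi = \varphi - E_{\pi_t}(\varphi)$. Note the arguments of $\mathcal{E}_{k+1:t}(\bar\varphi)$ are $x_{1:k}$
in general, but in the case considered in Section 3.3 it only depends on
$x_k$ and is therefore treated as a function $\mathcal{X} \to \mathbb{R}$. For the sake of clarity, we
treat the case $k = t - 2$, but the reasoning is easily generalized. The following
decomposition is deduced from identity (28):

$$\mathcal{E}_{t-1:t}(\bar\varphi)(x_{t-2}) = E_{q_{t-1}(x_{t-1} \mid x_{t-2})}\{\upsilon_{t-1}(x_{t-2}, x_{t-1})\, \mathcal{E}_t(\bar\varphi)(x_{t-1})\}$$

$$= E_{q_{t-1}(x_{t-1} \mid x_{t-2})}\{\upsilon_{t-1}(x_{t-2}, x_{t-1})\}\; E_{\pi_{t-1}(x_{t-1} \mid x_{t-2})}\{\mathcal{E}_t(\bar\varphi)(x_{t-1})\}.$$

It follows from (27) that the first term satisfies

$$E_{q_{t-1}(x_{t-1} \mid x_{t-2})}\{\upsilon_{t-1}(x_{t-2}, x_{t-1})\} \propto \int f(y_{t-1} \mid x_{t-1})\,g(x_{t-1} \mid x_{t-2})\,dx_{t-1},$$

where the proportionality constant can be retrieved by remarking that the
expectation of this term with respect to $\pi_{t-2}$ equals one and, therefore,

$$E_{q_{t-1}(x_{t-1} \mid x_{t-2})}\{\upsilon_{t-1}(x_{t-2}, x_{t-1})\} = \frac{\int f(y_{t-1} \mid x_{t-1})\,g(x_{t-1} \mid x_{t-2})\,dx_{t-1}}{E_{\pi_{t-2}(x_{t-2})}\{\int f(y_{t-1} \mid x_{t-1})\,g(x_{t-1} \mid x_{t-2})\,dx_{t-1}\}} \le 1 + \rho C_f$$

according to Lemma A.6. Note $\pi_{t-2}(x_{t-2})$ denotes the $\pi_{t-2}$-marginal density
of $x_{t-2}$. It follows from the decomposition above and the inequality in (26)
that

$$\Delta\mathcal{E}_{t-1:t}(\bar\varphi) \le (1 + \rho C_f)\,\Delta\tilde\psi,$$

where $\tilde\psi$ is the function

$$\tilde\psi(x_{t-2}) = E_{\pi_{t-1}(x_{t-1} \mid x_{t-2})}\{\mathcal{E}_t(\bar\varphi)(x_{t-1})\} = E_{\pi_{t-1}(x_{t-1} \mid x_{t-2})}[E_{q_t(x_t \mid x_{t-1})}\{\upsilon_t(x_{t-1}, x_t)\,\bar\varphi(x_t)\}].$$

Note that $\tilde\psi$ does take positive and negative values, since the expectation
of $\mathcal{E}_{t-1:t}(\bar\varphi)$ with respect to $\pi_{t-2}$ is null. We now decompose $\tilde\psi$ in the same
way,

$$\tilde\psi(x_{t-2}) = E_{\pi_{t-1}(x_{t-1} \mid x_{t-2})}[E_{q_t(x_t \mid x_{t-1})}\{\upsilon_t(x_{t-1}, x_t)\}]\; E_{\pi_t(x_t \mid x_{t-2})}\{\bar\varphi(x_t)\},$$

by consequence of the identity (29). The expectation of the first term with
respect to $\pi_{t-1}(x_{t-2})$ equals one, so that

$$E_{\pi_{t-1}(x_{t-1} \mid x_{t-2})}[E_{q_t(x_t \mid x_{t-1})}\{\upsilon_t(x_{t-1}, x_t)\}] = \frac{\iint \pi_{t-1}(x_{t-1} \mid x_{t-2})\,f(y_t \mid x_t)\,g(x_t \mid x_{t-1})\,dx_{t-1}\,dx_t}{E_{\pi_{t-1}(x_{t-2})}\{\iint \pi_{t-1}(x_{t-1} \mid x_{t-2})\,f(y_t \mid x_t)\,g(x_t \mid x_{t-1})\,dx_{t-1}\,dx_t\}} \le 1 + \rho\rho_2 C_f,$$

according to Lemmas A.5 and A.6. Resorting again to inequality (26), we
get

$$\Delta\tilde\psi \le (1 + \rho\rho_2 C_f)\,\rho_2^2\,\Delta\psi,$$

which leads to the desired inequality, and this completes the proof of Lemma A.7.
□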

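To conclude the proof of Theorem 5, remark that $E_{\tilde\pi_k}(\upsilon_k) = 1$. Therefore,

$$\upsilon_k(x_{k-1}, x_k) = \frac{f(y_k \mid x_k)\,g(x_k \mid x_{k-1})/q_k(x_k \mid x_{k-1})}{E_{\tilde\pi_k(x_{1:k})}\{f(y_k \mid x_k)\,g(x_k \mid x_{k-1})/q_k(x_k \mid x_{k-1})\}} \le C^2 \bar f/\underline{f},$$

and since the expectation of the function $\mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}$ with respect
to $\pi_k$ is null, the function $\mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}$ is ensured to take positive and
negative values, so that

$$\sup_{x_k \in \mathcal{X}} |\mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}(x_k)| \le \Delta\mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}$$

and, finally,

$$E_{\tilde\pi_k}[\upsilon_k^2\, \mathcal{E}_{k+1:t}\{\varphi - E_{\pi_t}(\varphi)\}^2] \le C^4 (\bar f/\underline{f})^2 \exp\left( 2\rho C_f \sum_{l=1}^{t-k} \rho_2^{l-1} \right) \rho_2^{2(t-k)} (\Delta\psi)^2$$

$$\le C^4 (\bar f/\underline{f})^2 \exp\{2\rho C_f/(1 - \rho_2)\}\, \rho_2^{2(t-k)} (\Delta\psi)^2.$$

It follows from (9) that $V_t(\varphi)$ is bounded from above by a convergent series.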